Abstract

Lot Quality Assurance Sampling (LQAS) surveys have become increasingly popular in global health care applications. 
Incorporating Bayesian ideas into LQAS survey design, such as using reasonable prior beliefs about the distribution of 
an indicator, can improve the selection of design parameters and decision rules. In this paper, a joint frequentist and 
Bayesian framework is proposed for evaluating LQAS classification accuracy and informing survey design parameters. 
Simple software tools are provided for calculating the positive and negative predictive value of a design with respect 
to an underlying coverage distribution and the selected design parameters. These tools are illustrated using a data 
example from two consecutive LQAS surveys measuring Oral Rehydration Solution (ORS) preparation. Using the 
survey tools, the dependence of classification accuracy on benchmark selection and the width of the 'grey region' is
clarified in the context of ORS preparation across seven supervision areas. Following the completion of an LQAS 
survey, estimation of the distribution of coverage across areas facilitates quantifying classification accuracy and can 
help guide intervention decisions. 
Introduction

Lot Quality Assurance Sampling (LQAS), also referred to 
as sampling for attributes and acceptance sampling, has 
a long history of applications in industrial quality control 
[1,2]. In the past 20 years, simple LQAS binary classifica- 
tion surveys have become increasingly popular in global 
health care applications [3]. In these LQAS surveys, an
area is classified as having acceptable or unacceptable cov- 
erage of a health indicator by sampling from the region 
and counting the number of individuals with positive 
values of the indicator. 

LQAS is a statistical tool based on frequentist notions of 
misclassification error. The development of generic train- 
ing manuals has allowed survey designers to avoid the 
statistical principles behind LQAS, relying on cookbook 
formulas [4]. Subsequently, decision-making via LQAS
in public health has been criticized [4-6]. Specific criti- 
cisms of LQAS surveys include difficulty in interpreting 
the results and high false positive rates [5,7]. To address
these criticisms, Olives and Pagano (2010, 2013) show that
a Bayesian approach facilitates quantifying the accuracy
of LQAS classifications and illustrate how to apply
Bayesian LQAS (B-LQAS) designs in public health
applications [8,9]. Myatt and Bennett (2008) propose
monitoring transmitted HIV drug resistance in developing
countries using sequential LQAS survey designs with
Bayesian interpretations [10]. However, B-LQAS designs
have not been applied frequently in public health practice.

The idea of melding Bayesian and frequentist ideas to 
improve statistical inferences has been gaining in popu- 
larity, e.g. [11,12]. In LQAS surveys, no standard proto- 
col or toolset exists for assessing implications of design 
parameter selection on the classification accuracy. Using 
reasonable prior beliefs about the distribution of cover- 
age can inform and improve the selection of LQAS design 
parameters and help interpret survey results. This paper 
addresses merging Bayesian and frequentist ideas when 
designing LQAS surveys to provide perspective on LQAS 
classification accuracy. Tools for quantifying classification 
accuracy before and after the survey are proposed; corre- 
sponding software programs are provided for implement- 
ing these tools. After conducting the survey, the survey 
data can be aggregated to inform about the classification 
accuracy of the design. This paper is structured as fol- 
lows. First, LQAS surveys (as often implemented in public 
health applications) are described; and data from two con- 
secutive LQAS surveys in Nepal are introduced. Next, 
limitations to a wholly Bayesian or frequentist design
procedure are discussed. To address these limitations, a
simple step-by-step process that incorporates Bayesian and
frequentist concepts is proposed for designing LQAS sur- 
veys. Finally, post hoc measures of classification accuracy 
are proposed using the collected survey data; and these 
methods are applied to assess the classification accuracy 
of the Nepal LQAS survey design. 

LQAS survey design 

LQAS is a binary classification procedure for classifying 
the coverage of an indicator as acceptable or unacceptable 
within a supervision area (SA). In a classical LQAS survey, 
$n$ individuals are randomly sampled from SA $i$. The number
of successes $X_i$ (based on the indicator) is counted
among the $n$ individuals. The SA coverage is classified as
acceptable if $X_i > d$ and unacceptable if $X_i \le d$. The
key design question is how to select n and d such that the 
procedure has good classification properties. 

The choice of n and d is determined by two equations 
that control the risk profile of the classification procedure: 

\[
\begin{aligned}
P(X_i \le d \mid n, p_i = p_u) &\le \alpha \\
P(X_i > d \mid n, p_i = p_l) &\le \beta,
\end{aligned}
\tag{1}
\]

where $X_i \sim \mathrm{Binomial}(n, p_i)$. The risk $\alpha$ is the probability
of classifying an area as unacceptable when $p_i = p_u$. The
risk $\beta$ is the probability of classifying an area as
acceptable when $p_i = p_l$. Areas with coverage between $p_l$ and
$p_u$ are in the 'grey region'. Misclassification risks are not
explicitly restricted for areas in the grey region; that is, 
the classification procedure is not designed to accurately 
distinguish between areas with true coverages lying in the 
grey region. 

The following steps are used to design an LQAS survey: 

1. Choose the binary indicator of interest and delineate 
the SAs. 

2. Select upper and lower thresholds $p_u$ and $p_l$.

3. Select risks $\alpha$ and $\beta$ corresponding to the thresholds
in Step 2.

4. Iteratively solve for $n$ and $d$ in Equation 1 using the
binomial cumulative distribution function (typically with a
software program).
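
Step 4 can be sketched in a few lines. The snippet below is a minimal illustration written for this text, not the lqasdesign package: the function names `binom_cdf` and `lqas_design` are hypothetical, and the rule "acceptable if $X_i > d$" is assumed. It searches for the smallest $n$ (with a matching $d$) that satisfies both constraints in Equation 1, and separately evaluates the risks of a given design.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def lqas_design(p_l, p_u, alpha, beta, n_max=200):
    """Smallest n, with a matching d, satisfying both constraints of Equation 1.

    Assumed rule: acceptable if X > d, unacceptable if X <= d.
    Returns (n, d, achieved_alpha, achieved_beta), or None if n_max is too small.
    """
    for n in range(1, n_max + 1):
        for d in range(n):
            a = binom_cdf(d, n, p_u)        # P(X <= d | p = p_u)
            b = 1.0 - binom_cdf(d, n, p_l)  # P(X >  d | p = p_l)
            if a <= alpha and b <= beta:
                return n, d, a, b
    return None

# Thresholds of 35% and 65% with both risks capped at 10%.
design = lqas_design(0.35, 0.65, 0.10, 0.10)
print("smallest design (n, d, alpha, beta):", design)

# Risks of a design with n = 19 and d = 9 at the same thresholds.
nepal_alpha = binom_cdf(9, 19, 0.65)
nepal_beta = 1.0 - binom_cdf(9, 19, 0.35)
print("n=19, d=9: alpha=%.4f beta=%.4f" % (nepal_alpha, nepal_beta))
```

The search returns the smallest sample size meeting the constraints, which need not coincide with the sample size chosen in any particular survey; field protocols often fix $n$ for logistical reasons and then choose $d$.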

The parameters $p_l$, $p_u$, $\alpha$, and $\beta$ are selected based on
subject-matter knowledge, often using the following
guidance: an SA with true coverage at or above $p_u$ should
be classified as unacceptable with low probability; and an
SA with true coverage below $p_l$ should be classified as
acceptable with low probability. The risks $\alpha$ and $\beta$ are the
maximum allowable risks of misclassification at the upper
threshold $p_u$ and lower threshold $p_l$, respectively. This
guidance for parameter selection may be sub-optimal,
especially when it is expected that a high proportion of
SAs will have true coverage in the grey region, between $p_l$
and $p_u$.

Example - ORS preparation in Nepal 

Throughout this paper, the survey described in [13] and 
the data provided therein are referenced as an illustrative 
example. The survey and data, as described in [13], are 
briefly summarized. LQAS was used to monitor whether 
mothers correctly prepared Oral Rehydration Solution
(ORS) in 7 supervision areas (SAs) in Nepal [13]. A base- 
line survey was conducted in January 1999 to monitor the 
coverage of the indicator "correct ORS preparation", and a 
follow up was conducted in January 2000. The goal of the 
January 2000 survey was to classify areas as achieving or 
failing to achieve the benchmark coverage target of 65%. 

Within an SA, $n = 19$ mothers were sampled, and $X_i$
correctly prepared ORS. The decision rule $d$ was selected,
and, if $X_i > d$, the SA was classified as achieving the
benchmark; otherwise, the SA was classified as failing to
achieve the benchmark. In the January 2000 follow-up
survey, the authors selected a lower threshold $p_l = 35\%$
and an upper threshold $p_u = 65\%$; misclassification risks
$\alpha$ and $\beta$ were restricted to less than 10%. The final
sample size was $n = 19$ and decision rule $d = 9$. A subset
of the survey results, as shown in [13], is reproduced in
Table 1.
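
As a small worked example of the decision rule, the sketch below applies $X_i > d$ with $d = 9$ to the January 2000 counts listed in Table 1. The classifications are computed here from the stated rule rather than quoted from [13], and the helper name `classify` is hypothetical.

```python
# January 2000 counts of mothers correctly preparing ORS, out of n = 19 (Table 1).
counts_2000 = {1: 7, 2: 9, 3: 14, 4: 13, 5: 17, 6: 19, 7: 12}
n, d = 19, 9

def classify(x, d):
    """Apply the LQAS decision rule: benchmark achieved only if x > d."""
    return "achieved" if x > d else "not achieved"

results = {sa: classify(x, d) for sa, x in counts_2000.items()}
for sa, label in results.items():
    print(f"SA {sa}: {counts_2000[sa]}/{n} -> {label}")
```

Note that an SA with exactly $d = 9$ successes falls on the "not achieved" side of the rule.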

Comparing classical and Bayesian LQAS designs 

One of the most common errors in statistical prac- 
tice is the misinterpretation of the p-value [14]. In 
hypothesis testing, p-values are often incorrectly ascribed 
Bayesian interpretations. Specifically, p-values are often 
(incorrectly) interpreted as the probability that the null 
hypothesis is true, given the data (versus the probabil- 
ity of observing data as extreme as what was observed, 
given the null hypothesis). Hence, the conditioning event 
is incorrectly reversed. 

Table 1 Nepal ORS data from baseline (June 1999) and follow-up (January 2000)

SA                 June 1999   January 2000
1                  7           7
2                  7           9
3                  12          14
4                  9           13
5                  11          17
6                  16          19
7                  8           12
Average coverage   52.6%       68.2%

The number of mothers correctly preparing ORS, out of 19, is displayed for each of
the seven supervision areas [13].

While LQAS is not explicitly a hypothesis testing
procedure, a similar error frequently occurs in LQAS
applications: frequentist classification risks $\alpha$ and $\beta$
are ascribed Bayesian interpretations. This error occurs 
because Bayesian risks are typically more informative 
in decision-making [5,8]. For example, classical LQAS 
risks pertain to the probability of classifying an SA as 
achieving the benchmark (or failing to reach bench- 
mark achievement), given the true coverage probability 
(Equation 1). The probability that coverage truly exceeds 
(or does not exceed) the benchmark, given a classification 
of benchmark achievement, is typically a more interest- 
ing quantity for guiding decision making. Consequently, 
the risks $\alpha$ and $\beta$ are often incorrectly interpreted in this
manner. 

To address the fact that Bayesian risks are more
informative for decision making, Olives and Pagano (2009)
proposed using a Bayesian classification procedure
(B-LQAS). The fundamental differences between Bayesian
and classical LQAS designs are the reversal of the
conditioning event and the conceptualization of $p_i$ as a random
variable in Bayesian surveys (Table 2). In the B-LQAS
design, the upper and lower thresholds, $p_u$ and $p_l$, are
again specified. Rather than specifying frequentist
classification risks $\alpha$ and $\beta$, the authors use Bayesian
classification risks; namely, $\alpha_B$ is the probability that
$p_i \ge p_u$, given that $X_i \le d$, and $\beta_B$ is the probability
that $p_i \le p_l$, given that $X_i > d$ [8]. The Bayesian risks
$\alpha_B$ and $\beta_B$ are conditional on the classification decision.
To calculate these classification risks, Bayesian designs
require specification of one additional quantity, a prior
distribution $\hat{\pi}(\cdot)$. The specified prior distribution
$\hat{\pi}(\cdot)$ is an estimate of the distribution of $p_i$, denoted
$\pi(\cdot)$. Heuristically, in a Bayesian framework, coverage $p_i$
is a random variable that fluctuates, and $\pi(\cdot)$ measures the
range of feasible variability in $p_i$ at the time of the survey.

Conceptualization, and subsequently estimation, of this
distribution $\pi(\cdot)$ is a difficult task. In industrial quality
control, a precise estimate of $\pi(\cdot)$ can be constructed
by measuring the defect rate for a batch of goods (for,
say, a production line) across many different batches. In
public health, conceptualizing this prior distribution is
less straightforward, because coverage rates fluctuate over
time and space; hence, $\pi(\cdot)$ is never known prior to the
survey. One possible definition of $\pi(\cdot)$ is the underlying



Table 2 Reversal of the conditioning event in Bayesian and frequentist LQAS surveys

Classical                                                Bayesian
$\alpha = P(\text{Fail to achieve benchmark} \mid p_i = p_u)$   $\alpha_B = P(p_i \ge p_u \mid \text{Fail to achieve benchmark})$
$\beta = P(\text{Achieve benchmark} \mid p_i = p_l)$            $\beta_B = P(p_i \le p_l \mid \text{Achieve benchmark})$



distribution of coverage across SAs at the time of the 
survey. This definition implicitly assumes that the region 
contains a large number of SAs; that SAs are independent; 
and that no prior knowledge about differences in coverage 
by SA exists. This definition of $\pi(\cdot)$ is used throughout the
rest of this manuscript. 

Often, surveys are conducted to update knowledge
about $\pi(\cdot)$, which likely changes over time [9]. For instance,
in the Nepal ORS coverage example, efforts are made
to improve coverage over time; understanding temporal
changes in coverage (i.e. temporal changes in $\pi(\cdot)$) is a
goal of the LQAS surveillance program. B-LQAS survey
designs rely on correct a priori specification of this
distribution, namely $\hat{\pi}(\cdot) = \pi(\cdot)$, and are sensitive to the
choice of $\hat{\pi}(\cdot)$ [9]. Hence, the major limitation of Bayesian
designs is the requirement of correct prior specification
(which is unlikely in practice).

Misspecification of $\hat{\pi}(\cdot)$ in the design phase biases
inferences, with the severity of bias depending on the degree
of misspecification. To understand the root of this bias,
note that the risks $\alpha_B$ and $\beta_B$ for a classification
procedure are a function of the specified prior $\hat{\pi}(\cdot)$; the
collected data naturally do not inform the survey design or
decision rules, whereas the prior selection does. This prior
$\hat{\pi}(\cdot)$ (the estimate of the underlying distribution of
coverage in the population) is not utilized in the same manner
as in standard Bayesian analyses, where prior information
is updated using collected data to construct a posterior
distribution. For example, in Bayesian statistics, when
little prior information is available, non-informative priors
are chosen to reflect the lack of prior beliefs (allowing
the data to dominate prior beliefs). In B-LQAS designs,
non-informative priors are actually informative. A
non-informative (flat) prior would suggest that all values of
$p_i$ are a priori equally likely, which is often a strong,
and incorrect, assumption. The prior $\hat{\pi}(\cdot)$ is specified
before the survey occurs, and classification rules depend
on the distribution $\hat{\pi}(\cdot)$. Hence, the use of the term
"prior" for $\hat{\pi}(\cdot)$ is an unfortunate misnomer for B-LQAS
designs.

If the specified prior does not reflect the distribution
of coverage at the time of the survey, the risks $\alpha_B$ and $\beta_B$
do not represent the true error rates for the classification
procedure. Further, there is no way to assess the accuracy
of the prior $\hat{\pi}(\cdot)$ a priori; the only way to assess its
accuracy is through a post hoc estimation of the
prior from the collected survey data.
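
The dependence of the Bayesian risks on the specified prior can be made concrete with a short numerical sketch. The code below is an illustration written for this text, not the B-LQAS software of [8,9]: it assumes a Beta prior, the rule "acceptable if $X_i > d$", and a simple midpoint-rule integration, and the function names are hypothetical. It evaluates $\alpha_B$ and $\beta_B$ for the same design ($n = 19$, $d = 9$) under two priors.

```python
from math import comb, exp, lgamma, log

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))

def prob_accept(p, n, d):
    """P(X > d | p) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d + 1, n + 1))

def bayes_risks(n, d, p_l, p_u, a, b, grid=4000):
    """Return (alpha_B, beta_B) under a Beta(a, b) prior via the midpoint rule.

    alpha_B = P(p >= p_u | X <= d), beta_B = P(p <= p_l | X > d).
    """
    h = 1.0 / grid
    fail = accept = fail_and_high = accept_and_low = 0.0
    for j in range(grid):
        p = (j + 0.5) * h
        weight = beta_pdf(p, a, b) * h
        acc = prob_accept(p, n, d)
        accept += acc * weight
        fail += (1.0 - acc) * weight
        if p >= p_u:
            fail_and_high += (1.0 - acc) * weight
        if p <= p_l:
            accept_and_low += acc * weight
    return fail_and_high / fail, accept_and_low / accept

flat = bayes_risks(19, 9, 0.35, 0.65, 1.0, 1.0)
informed = bayes_risks(19, 9, 0.35, 0.65, 9.6, 8.7)
print("B(1, 1):      alpha_B=%.3f beta_B=%.3f" % flat)
print("B(9.6, 8.7):  alpha_B=%.3f beta_B=%.3f" % informed)
```

The design is identical in both calls; only the assumed prior changes, and the reported risks change with it.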

Design tools for an LQAS survey 

LQAS can be conceptualized as a population screening 
tool, where SAs are screened to examine if a bench- 
mark coverage level is achieved. For standard screening 
tools (or diagnostic tests), sensitivity and specificity mea- 
sure the true positive and negative rates of screening 
tools; namely, these quantities answer the question
"conditional on disease status, how often does the screening
tool give the correct diagnosis?" For a patient who tests
positive or negative based on the screening tool,
sensitivity and specificity are not directly relevant. Rather,
positive predictive value (PPV) and negative predictive value
(NPV), which quantify the probability of correct diagnosis 
conditional on the test result, help the patient under- 
stand the likelihood of having the disease. PPV and NPV 
are typically calculated from sensitivity and specificity, 
using Bayes' theorem and knowledge of the population
prevalence. 

In classical LQAS surveys, risks $\alpha$ and $\beta$ condition
on the true population coverage, analogous to sensitivity
and specificity, which condition on disease status.
Classical LQAS survey designs always maintain the specified
classification risks $\alpha$ and $\beta$, regardless of the
underlying coverage distribution $\pi(\cdot)$. In classical LQAS surveys,
operating characteristic (OC) curves and risk curves [9]
are often plotted to summarize the design properties. The
OC curve is defined as $P(X_i > d \mid p)$; and the risk curve
is defined as $P(\text{incorrect classification} \mid p)$, where, for a
programmatic target $p^*$ (defined below), classification is
incorrect if $X_i > d$ when $p < p^*$ or $X_i \le d$ when $p \ge p^*$.
The risks $\alpha$ and $\beta$, the OC curve, and the risk curve are
frequentist design summaries that condition on the true
coverage, and therefore do not directly inform classification
accuracy. That is, these measures do not inform how
likely it is that an SA has achieved the benchmark,
conditional on the classification decision (analogous to PPV
and NPV); this measure is a function of the distribution of
coverage, $\pi(\cdot)$.
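
The OC curve and risk curve defined above can be evaluated directly from the binomial distribution. The following is a minimal sketch with hypothetical function names, assuming the rule "acceptable if $X_i > d$" and treating $p \ge p^*$ as having reached the target.

```python
from math import comb

def oc_curve(p, n, d):
    """P(X > d | p): probability of classifying coverage as acceptable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d + 1, n + 1))

def risk_curve(p, n, d, p_star):
    """P(incorrect classification | p) relative to the target p_star."""
    accept = oc_curve(p, n, d)
    # Accepting is an error below the target; rejecting is an error at or above it.
    return accept if p < p_star else 1.0 - accept

n, d = 19, 9
for p in (0.20, 0.35, 0.50, 0.64, 0.65, 0.80):
    print(f"p={p:.2f}  OC={oc_curve(p, n, d):.3f}  "
          f"risk(p*=0.65)={risk_curve(p, n, d, 0.65):.3f}")
```

With $p^* = 0.65$, the risk curve is high for coverage just below the target and drops sharply at the target itself, which is why the choice of $p^*$ matters when true coverage is likely to fall in the grey region.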

Classification accuracy of a survey design measures how 
frequently the design correctly classifies coverage and per- 
tains to PPV and NPV, which are inherently Bayesian 
quantities. In order to define the PPV and NPV of a design, 
it is useful to first designate a programmatic target p*,
denoting the cut-off for correct versus incorrect classifi- 
cations [4]. In the Nepal example, selecting p* = 0.65 is a 
reasonable choice; with p* = 0.65, classifying areas with 
true coverage in the grey region (35%-65%) as achieving 
the benchmark of 65% coverage is an error. Specifica- 
tion of p* is not mandatory for designing a survey, but is 
essential for evaluating the classification accuracy of the 
survey. 

Throughout this article, the following definitions of PPV 
and NPV are used: 

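\[
\begin{aligned}
\mathrm{PPV} &= P(\text{high coverage} \mid \text{classified as high}) \\
&= P(p_i \ge p^* \mid X_i > d)
 = \frac{P(X_i > d \mid p_i \ge p^*)\, P(p_i \ge p^*)}{P(X_i > d)} \\
&= \frac{P(X_i > d \mid p_i \ge p^*) \int_{p^*}^{1} \pi(p)\, dp}
        {\int_{0}^{1} P(X_i > d \mid p)\, \pi(p)\, dp} \\[1ex]
\mathrm{NPV} &= P(\text{low coverage} \mid \text{classified as low}) \\
&= P(p_i < p^* \mid X_i \le d)
 = \frac{P(X_i \le d \mid p_i < p^*)\, P(p_i < p^*)}{P(X_i \le d)} \\
&= \frac{P(X_i \le d \mid p_i < p^*) \int_{0}^{p^*} \pi(p)\, dp}
        {\int_{0}^{1} P(X_i \le d \mid p)\, \pi(p)\, dp}
\end{aligned}
\tag{2}
\]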

PPV and NPV are based on the unknown true underlying
distribution for $p_i$, $\pi(\cdot)$. Estimates of these quantities,
denoted $\widehat{\mathrm{PPV}}$ and $\widehat{\mathrm{NPV}}$, are calculated with respect to
the specified prior distribution by substituting $\hat{\pi}(\cdot)$ for
$\pi(\cdot)$ in Equation 2. While $\pi(\cdot)$ will not be known explicitly
for public health applications, a range of potential
distributions could likely be elicited from program managers
before conducting a survey (i.e. by specifying several
different choices of $\hat{\pi}(\cdot)$ after considering feasible ranges
for $p_i$).

The proposed design tools estimate the classification
accuracy (PPV and NPV) of a design for a range of
distributions $\{\hat{\pi}(\cdot)\}$. Simple calculations of PPV and NPV for
various specifications of $\hat{\pi}(\cdot)$ provide a sensitivity
analysis for the classification accuracy of the survey design.
For instance, if most SAs have true coverage in the grey
region, the survey will have either very poor PPV or very
poor NPV. When selecting prior distributions, $\hat{\pi}(\cdot)$ should
be chosen to reflect current beliefs as closely as possible;
using multiple plausible choices of $\hat{\pi}(\cdot)$ is best unless the
actual $\pi(\cdot)$ is known with some certainty. Given that surveys
are often conducted to measure changes in the distribution
of coverage over time, substantial uncertainty will usually
exist a priori in estimates of $\pi(\cdot)$.
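
Equation 2 can be evaluated numerically once a prior is specified. The sketch below is an illustration written for this text rather than the lqasdesign implementation: it assumes a Beta prior, the rule "acceptable if $X_i > d$", and midpoint-rule integration on a fine grid, and the function names are hypothetical.

```python
from math import comb, exp, lgamma, log

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))

def prob_accept(p, n, d):
    """P(X > d | p) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d + 1, n + 1))

def ppv_npv(n, d, p_star, a, b, grid=4000):
    """Estimated (PPV, NPV) from Equation 2 under a Beta(a, b) prior."""
    h = 1.0 / grid
    acc_total = acc_high = rej_total = rej_low = 0.0
    for j in range(grid):
        p = (j + 0.5) * h
        weight = beta_pdf(p, a, b) * h
        acc = prob_accept(p, n, d)
        acc_total += acc * weight
        rej_total += (1.0 - acc) * weight
        if p >= p_star:
            acc_high += acc * weight          # accepted and truly high
        else:
            rej_low += (1.0 - acc) * weight   # rejected and truly low
    return acc_high / acc_total, rej_low / rej_total

priors = {"B(1, 1)": (1.0, 1.0), "B(9.6, 8.7)": (9.6, 8.7)}
estimates = {}
for name, (a, b) in priors.items():
    for p_star in (0.35, 0.65):
        estimates[(name, p_star)] = ppv_npv(19, 9, p_star, a, b)
        print(name, "p*=%.2f" % p_star,
              "PPV=%.3f NPV=%.3f" % estimates[(name, p_star)])
```

Running the same design through several priors in this way is the sensitivity analysis described above: the design parameters stay fixed while the assumed coverage distribution varies.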

The steps for designing a classical LQAS survey were
described in the above section. The design parameters ($p_u$,
$p_l$, $p^*$, $\alpha$ and $\beta$) can be selected by evaluating the
classification accuracy of the design. Specifically, to select these
design parameters, the following steps are proposed:

1. Select a programmatic target $p^*$. Then, select $p_l$, $p_u$,
$\alpha$, and $\beta$, using subject-matter knowledge and
keeping $p^*$ in mind.

2. Determine $n$ and $d$ corresponding to this choice.

3. Plot the risk curve [9] for the survey design.

4. Construct multiple plausible estimates of $\pi(\cdot)$, using
subject-matter knowledge and historical data.
Calculate $P(p < p^*)$ and $P(p_l < p < p_u)$ for each
distribution, to gauge how much of the mass of the
prior lies above/below the target $p^*$ and within the
grey region $(p_l, p_u)$.

5. Calculate the PPV and NPV for each specified
prior $\hat{\pi}(\cdot)$. Also, calculate the probability of true
coverage lying within the grey region, given the
classification.

6. Return to step (1) if the design parameters do not
provide sufficiently accurate classifications; consider
reducing the misclassification risks or narrowing the
grey region.

The R package, lqasdesign, written in R version 
2.15.2 [15], contains functions for designing LQAS sur- 
veys and evaluating the designs (the R package is 
Additional file 1). The package includes functions for 
calculating the sample size and decision rule for an 
LQAS design (Step 2 above); and functions for conduct- 
ing sensitivity analyses to examine the design parame- 
ter choices for different prior specifications (Steps 2-4). 
To facilitate use of these tools, the package contains 
an interactive web application for survey design,
constructed using the shiny package from RStudio [16].
Screenshots and simple instructions for using the
application are in Appendix B in Additional file 2.
Instructions for using the package are in the package manual,
accessed by typing vignette("manual", package = "lqasdesign")
in R.

Eliciting various prior distributions in step (4) is non- 
trivial. Restricting to families of distributions with support 
between 0 and 1 is preferable when modeling propor- 
tions. The unimodal Beta distribution is implemented in 
the R package for simplicity. The Beta distribution is
characterized by two parameters, $a$ and $b$, and is denoted
$B(a, b)$. The mean of a $B(a, b)$ distribution is $a/(a + b)$,
and the standard deviation is also a function of $a$ and $b$.
More properties of the Beta distribution are discussed in 
Appendix A in Additional file 2. Expanding the R func- 
tions to accommodate other prior distributions, such as a 
mixture of Beta distributions, is of interest in future work. 

Using the provided R programs, the user can specify a
mean and standard deviation for $p_i$ to obtain a Beta prior.
The Beta distribution can be asymmetric and highly
skewed (making the standard deviation more difficult to
specify). The R package contains functions for plotting the
selected prior and calculating the probabilities in Step 4 of
the design algorithm, to ensure that the user's prior beliefs
adequately match the shape of the selected distribution.
The user can also input data from past surveys (across
multiple SAs) and find the best-fitting Beta distribution
for the data. The selected prior(s) should represent the
range of current beliefs about the distribution of coverage
in a region, specified using existing data, expert opinions,
or both. In the section below, a step-by-step example of
choosing various prior distributions using baseline data is
considered.
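
One simple way to turn a mean and standard deviation into Beta parameters is the method of moments. The sketch below assumes that approach; the text does not state how lqasdesign performs the conversion, and the function names and the input values (mean 0.675, standard deviation 0.172) are illustrative choices made here. It also computes the Step 4 probabilities by midpoint-rule integration.

```python
from math import exp, lgamma, log

def beta_from_moments(mean, sd):
    """Method-of-moments Beta(a, b) with the given mean and standard deviation."""
    var = sd ** 2
    if not (0 < mean < 1) or var >= mean * (1 - mean):
        raise ValueError("need 0 < mean < 1 and sd^2 < mean * (1 - mean)")
    k = mean * (1 - mean) / var - 1   # k equals a + b
    return mean * k, (1 - mean) * k

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))

def beta_mass(lo, hi, a, b, grid=4000):
    """P(lo < p < hi) under Beta(a, b), by the midpoint rule."""
    h = (hi - lo) / grid
    return sum(beta_pdf(lo + (j + 0.5) * h, a, b) for j in range(grid)) * h

# Illustrative inputs (assumed here, not taken from the package documentation).
a, b = beta_from_moments(0.675, 0.172)
print("Beta parameters: a=%.2f b=%.2f" % (a, b))

p_l, p_u, p_star = 0.35, 0.65, 0.65
print("P(p < p*)        = %.3f" % beta_mass(0.0, p_star, a, b))
print("P(p_l < p < p_u) = %.3f" % beta_mass(p_l, p_u, a, b))
```

These inputs give parameters close to the $B(4.3, 2.1)$ prior used in the application below, so the printed probabilities indicate roughly how much prior mass that choice places below the target and inside the grey region.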

Application: ORS coverage survey design properties 

The evaluation tools are illustrated using the design of the
January 2000 follow-up Nepal ORS coverage survey, with
$n = 19$ and $d = 9$. The classification risks $\alpha$ and $\beta$ are
both 0.087. Hence, the probability of being classified as
failing to achieve the benchmark when $p \ge p_u$ is at most
0.087; and the probability of being classified as achieving
the benchmark when $p \le p_l$ is at most 0.087. The risk
curves with $p^* = 0.35$ and $p^* = 0.65$ are
plotted in Figure 1.

Next, Bayesian summary measures for the survey design
are examined, after specifying an underlying coverage
distribution for $p_i$. Several different $\hat{\pi}(\cdot)$ distributions are
considered: $B(1, 1)$, a flat, "non-informative" prior
(coverage of an SA has an equal probability of taking on any
value between 0 and 1); $B(9.6, 8.7)$, a Beta distribution
consistent with the information observed at the first
survey in January 1999; and $B(4.3, 2.1)$, a Beta prior consistent
with the idea that mean coverage increased by 15 percentage
points from January 1999 to January 2000, but the standard
deviation remained the same. PPV and NPV are sensitive to
both the mean and variance of $\hat{\pi}(\cdot)$. Two additional priors
are considered, choosing the same mean as the $B(4.3, 2.1)$
prior, but reducing the standard deviation by half
($B(19.4, 9.3)$) and raising the standard deviation by 25%
($B(2.5, 1.2)$) to assess sensitivity of the design properties
to the spread of the distribution. Heuristically, the survey
properties will be different if the $p_i$ are constrained to a
narrow range. These prior distributions are plotted in Figure 2.
Positive predictive value and negative predictive value
are calculated for the design, assuming that $p_i$ is a random
variable generated from the prior distribution.
Programmatic thresholds $p^* = 0.35$ and $p^* = 0.65$ are
considered; the probability that true coverage lies in the grey
region ($p_l$ to $p_u$) is also calculated. Results are displayed in
Table 3.

These calculations help clarify the role of p* in
interpreting LQAS surveys. With p* = .65, the survey has
excellent NPV and mediocre PPV. Hence, areas classified 
as failing to achieve the benchmark likely have coverage 
less than 65%. Areas achieving the benchmark may or may 
not have coverage greater than 65%. Given that p* = 65% 
is likely the most contextually relevant threshold for the 
application, program managers would know to interpret 
'benchmark achievement' with caution or to construct a 
different design by changing the thresholds $p_l$ and $p_u$. If
instead the benchmark were p* = .35, the survey has 
excellent PPV and mediocre to poor NPV, depending on 
the prior. In this case, if an area is classified as achieving 
the benchmark, it is likely that coverage is at least 35%. If 
an area is classified as failing to achieve the benchmark, 
it is unclear whether coverage is greater than or less than 
35%. 

Lastly, examining the grey region properties clarifies 
that the probability of an area having true coverage in the 
grey region is non-negligible for all of the prior selections. 
This result is not surprising, because the grey region spans 
30% of the support of $p_i$. Narrowing the grey region would
improve classification accuracy at the cost of an increased 
sample size.

Figure 2 Prior Distributions. Prior distributions used in the ORS
survey design sensitivity analysis (legend: B(9.6, 8.7), B(4.3, 2.1),
B(19.4, 9.3), B(2.5, 1.2); x-axis: coverage, 0.0 to 1.0). The histogram
represents a plot of the actual data across the 7 SAs in January 1999.



Appendix C in Additional file 2 contains R code for 
replicating all of the analyses in this manuscript. 

Post-survey tools for LQAS surveys 

LQAS data are often summarized by presenting a con- 
fidence interval for coverage, aggregating over all SAs; 
this confidence interval provides a measure of uncer- 
tainty associated with the overall coverage in the region. 
Additionally, the number of SAs classified as accept- 
able/unacceptable is usually presented. These standard 
summary measures do not inform the classification
accuracy of the design. Understanding classification accuracy
of the design procedure can help determine how to
allocate resources. Estimating the coverage distribution $\pi(\cdot)$
after the survey can help measure the accuracy of the
classifications a posteriori.

As a hypothetical example, suppose that, out of 10
SAs, 5 achieve the benchmark and 5 do not. Consider two
different extreme scenarios: 1) the coverage distribution
$\pi(\cdot)$ is bimodal, and no areas have true coverage between
35% and 65%; and 2) $\pi(\cdot)$ is unimodal and all 10 areas have
true coverage between 35% and 65%. Using a standard
LQAS survey protocol, it is unclear how to distinguish
between scenarios 1 and 2 for decision-making. For surveys
like the Nepal survey, with a grey region spanning almost
a third of the support of $p_i$, scenario 2 is likely common.
Characterizing the distribution of coverage across SAs,
$\pi(\cdot)$, and incorporating this information into the
decision-making process can improve the efficacy of LQAS as a
monitoring and evaluation tool.

When LQAS surveys are conducted in many SAs within
a region, the underlying distribution of coverage across
SAs in the region, $\pi(\cdot)$, can be estimated and used to
calculate the expected proportion of SAs with coverage below
$p^*$ and with coverage in the grey region, $p_l$ to $p_u$. To
estimate $\pi(\cdot)$, again assume that the true prevalence in an SA,
$p_i$, is a random variable drawn from $\pi(\cdot)$.

The distribution $\pi(\cdot)$ is estimated using several
approaches: assuming a parametric Beta distribution; 
using a simple non-parametric histogram; and using 
kernel density estimation [17,18] for non-parametric 
smoothing. For a review on density estimation, see [19] 
and references within. For the kernel density estimator, 
the default bandwidth h * m~ 3 is used in the R program 
and throughout the analysis, where h is the bandwidth 
using Silverman's rule of thumb [20] and m is the number 
of SAs. This bandwidth is selected to prioritize unbiased- 
ness (over variance reduction) in estimation and avoid 
oversmoothing [21,22]. Following density estimation, 
the probabilities $P(p_i < p^*)$ and $P(p_l < p_i < p_u)$ are
estimated, with corresponding standard errors estimated 
using bootstrap resampling [23]. 

By estimating $\pi(\cdot)$ from the data, program managers can
learn important properties about the distribution of $p_i$

Table 3 Properties of the survey designs for various prior specifications

               p* = .35                  p* = .65                  p in (.35, .65)
Prior          S(p*)   PPV     NPV       S(p*)   PPV     NPV       P_grey  PPV     NPV
B(1, 1)        0.650   0.991   0.692     0.350   0.692   0.991     0.300   0.300   0.300
B(9.6, 8.7)    0.937   0.995   0.139     0.143   0.243   0.986     ...
...            1.000   0.832   0.366     0.831   0.908   0.997     0.381   0.316

For the grey region, $P_{\text{grey}} = P(p \in (.35, .65))$, PPV $= P(p_l < p < p_u \mid X_i > 9)$ and NPV $= P(p_l < p < p_u \mid X_i \le 9)$.



to inform intervention decisions. A bimodal distribution 
implies that some areas are performing well, while others 
are performing poorly. A unimodal distribution centered 
in the grey region suggests that area-specific interventions 
might not be as effective as a region-wide intervention, 
and binary classifications should not be over-interpreted. 
Further, the estimated density can guide survey design in 
the next round of surveillance. In the Nepal survey design 
application, data from the first round of surveillance were 
used to construct an array of prior distributions, first 
estimating the density from baseline data and then shift- 
ing the mean and varying the standard deviation of this 
estimated density. 

When the number of subjects sampled per SA and 
number of SAs are both large, the nonparametric den- 
sity estimators are unbiased, and the parametric estimator 
is unbiased if the Beta model is correct. In finite sam- 
ples, density estimators are biased. In Appendix D in 
Additional file 2, finite sample bias and standard errors
are evaluated for $P(p_i < p_l)$, $P(p_i > p_u)$, and
$P(p_l < p_i < p_u)$ using a simulation study. The finite sample
bias is non-negligible and varies depending on the mode of
estimation. Estimating the probabilities $P(p_i < p_l)$,
$P(p_i > p_u)$, and $P(p_l < p_i < p_u)$ can inform classification
accuracy, but the estimated probabilities and standard errors
may exhibit substantial bias in small samples.

Application: ORS coverage density 

The survey conducted in January 2000 contained only 
7 SAs and 19 people per SA. Therefore, all of the pro- 
posed density estimators are biased, and it is important 
to avoid over-interpreting these results. The estimates of 
$\pi(\cdot)$ using the parametric Beta distribution, the kernel
density estimator, and the crude histogram are plotted in
Figure 3. 

The estimated percentage of areas with true coverage in
the grey region is 42.9% (sd = 18.6%) using the crude
histogram; 36.9% (sd = 14.0%) using kernel density
estimation; and 34.2% using the Beta distribution (standard error
not available due to small sample size and lack of estimator
convergence in bootstrap samples). While these estimates
are likely somewhat biased, the results suggest that a high
proportion of SAs could have true coverages in the grey
region.
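
The crude histogram estimate and its bootstrap standard deviation can be approximated with a few lines of code. The sketch below makes two assumptions that the text does not spell out: that the "crude histogram" estimate is the share of observed proportions $X_i/n$ falling strictly inside the grey region, and that the bootstrap resamples SAs with replacement. It uses the January 2000 counts from Table 1; the function names and the seed are arbitrary choices made here.

```python
import random

counts_2000 = [7, 9, 14, 13, 17, 19, 12]  # successes out of n = 19 (Table 1)
n = 19
p_l, p_u = 0.35, 0.65

def grey_fraction(counts):
    """Share of SAs whose observed proportion lies strictly inside (p_l, p_u)."""
    return sum(p_l < x / n < p_u for x in counts) / len(counts)

def bootstrap_sd(counts, reps=5000, seed=1):
    """Bootstrap standard deviation of grey_fraction, resampling SAs with replacement."""
    rng = random.Random(seed)
    m = len(counts)
    stats = [grey_fraction([rng.choice(counts) for _ in range(m)])
             for _ in range(reps)]
    mean = sum(stats) / reps
    return (sum((s - mean) ** 2 for s in stats) / (reps - 1)) ** 0.5

estimate = grey_fraction(counts_2000)
sd = bootstrap_sd(counts_2000)
print("grey-region share: %.1f%% (bootstrap sd about %.1f%%)"
      % (100 * estimate, 100 * sd))
```

With only seven SAs, the bootstrap distribution is coarse (the estimate can only take values that are multiples of 1/7), which is one reason to interpret the reported standard deviations cautiously.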

Conclusion 

In this paper, a simple evaluation framework for LQAS 
survey designs is developed by melding Bayesian and 
frequentist ideas. The suggested tools are implemented 
within the free software program R; detailed instructions 
and a user-friendly web-based application should
facilitate the use of these tools. The practicality of LQAS lies
in its simplicity; the entire design is determined by four
parameters, $p_l$, $p_u$, $\alpha$, and $\beta$, and can be evaluated with
respect to a programmatic target $p^*$. However, arbitrary
specification of the design parameters without consid- 
ering concepts such as positive and negative predictive 
value is potentially dangerous. The implications of choos- 
ing a grey region of width 30% are clearer following these 
calculations. When design properties are less than ideal, 
abandoning binary classification in favor of a three-tiered 
[24] or double sampling approach [25] is a viable option. 

Figure 3 Density Estimates. Underlying coverage density estimates
following data collection in the 7 SAs in January 2000.

In this paper, Bayesian survey designs are discussed
based on the classification risks $\alpha_B$ and $\beta_B$, to
facilitate contrasting the Bayesian and classical survey designs.
Alternative Bayesian designs (e.g. using different loss func- 
tions) are discussed in [8,9]. Due to the subjective specifi- 
cation of the prior and potential for bias, purely Bayesian 
designs can perform poorly in practice unless the prior is 
known with certainty. 

Public health applications of LQAS typically use simple 
binary classification for decision-making, though other, 
non-binary types of outcomes have been explored in public
health. Olives et al. (2012) construct classification
designs for ordinal variables with more than two
categories [24]. Hypergeometric models are also used in
practice when population sizes are small, e.g. [25,26]. Future
work should explore developing LQAS design diagnostic 
tools for these different outcome models, with appropri- 
ate prior selection; as well as explore developing LQAS 
designs and analysis tools for other types of outcomes, 
such as the Poisson model for rates and the normal model 
for means. 

This paper is intended as a first step toward developing 
more sophisticated tools for LQAS survey design evalu- 
ation using Bayesian concepts. LQAS designs are being 
extended for more complex applications [10,27,28]. As 
the complexity of these surveys increases, training materi- 
als and additional survey evaluation tools that encourage 
program managers to understand the entire LQAS proba- 
bilistic framework will become increasingly valuable. 

Additional files 



Additional file 1: lqasdesign R package. Additional file 1 contains the
lqasdesign R package.

Additional file 2: Appendices. Appendix A contains a description of the 
Beta distribution; Appendix B contains a description of the web-based R 
application; Appendix C contains complete R code for reproducing the 
analysis in the manuscript; and Appendix D contains a simulation study
assessing properties of the density estimators. 